大型语言模型开发的最新进展导致公众访问最先进的预训练的语言模型(PLM),包括生成培训的预训练的变压器3(GPT-3)(GPT-3)和Transformers(来自Transformers)的双向编码器(伯特)。但是,实际上,对PLM的评估表明,在培训和开发的微调阶段,它们对对抗性攻击的敏感性。这种攻击可能导致错误的输出,模型生成的仇恨言论以及用户敏感信息的暴露。尽管现有的研究集中在PLM的培训或微调期间的对抗攻击上,但有关这两个发展阶段之间攻击的信息不足。在这项工作中,我们重点介绍了GPT-3公开发行的主要安全漏洞,并进一步研究了其他最先进的PLM中的这种漏洞。我们将工作限制在没有经过微调的预培训模型中。此外,我们强调了令牌距离最小化的扰动作为一种有效的对抗方法,绕过受监督和无监督的质量措施。遵循这种方法,在评估语义相似性时,我们观察到文本分类质量的显着降低。
translated by 谷歌翻译
Neural Representations have recently been shown to effectively reconstruct a wide range of signals from 3D meshes and shapes to images and videos. We show that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN). Inspired by coordinate inputs of previous neural representation methods, we assign a coordinate to each convolutional kernel in our network based on its position in the architecture, and optimize a predictor network to map coordinates to their corresponding weights. Similarly to the spatial smoothness of visual scenes, we show that incorporating a smoothness constraint over the original network's weights aids NeRN towards a better reconstruction. In addition, since slight perturbations in pre-trained model weights can result in a considerable accuracy loss, we employ techniques from the field of knowledge distillation to stabilize the learning process. We demonstrate the effectiveness of NeRN in reconstructing widely used architectures on CIFAR-10, CIFAR-100, and ImageNet. Finally, we present two applications using NeRN, demonstrating the capabilities of the learned representations.
translated by 谷歌翻译
A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes. We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io/
translated by 谷歌翻译
Micron-scale robots (ubots) have recently shown great promise for emerging medical applications, and accurate control of ubots is a critical next step to deploying them in real systems. In this work, we develop the idea of a nonlinear mismatch controller to compensate for the mismatch between the disturbed unicycle model of a rolling ubot and trajectory data collected during an experiment. We exploit the differential flatness property of the rolling ubot model to generate a mapping from the desired state trajectory to nominal control actions. Due to model mismatch and parameter estimation error, the nominal control actions will not exactly reproduce the desired state trajectory. We employ a Gaussian Process (GP) to learn the model mismatch as a function of the desired control actions, and correct the nominal control actions using a least-squares optimization. We demonstrate the performance of our online learning algorithm in simulation, where we show that the model mismatch makes some desired states unreachable. Finally, we validate our approach in an experiment and show that the error metrics are reduced by up to 40%.
translated by 谷歌翻译
A master face is a face image that passes face-based identity authentication for a high percentage of the population. These faces can be used to impersonate, with a high probability of success, any user, without having access to any user information. We optimize these faces for 2D and 3D face verification models, by using an evolutionary algorithm in the latent embedding space of the StyleGAN face generator. For 2D face verification, multiple evolutionary strategies are compared, and we propose a novel approach that employs a neural network to direct the search toward promising samples, without adding fitness evaluations. The results we present demonstrate that it is possible to obtain a considerable coverage of the identities in the LFW or RFW datasets with less than 10 master faces, for six leading deep face recognition systems. In 3D, we generate faces using the 2D StyleGAN2 generator and predict a 3D structure using a deep 3D face reconstruction network. When employing two different 3D face recognition systems, we are able to obtain a coverage of 40%-50%. Additionally, we present the generation of paired 2D RGB and 3D master faces, which simultaneously match 2D and 3D models with high impersonation rates.
translated by 谷歌翻译
The use of needles to access sites within organs is fundamental to many interventional medical procedures both for diagnosis and treatment. Safe and accurate navigation of a needle through living tissue to an intra-tissue target is currently often challenging or infeasible due to the presence of anatomical obstacles in the tissue, high levels of uncertainty, and natural tissue motion (e.g., due to breathing). Medical robots capable of automating needle-based procedures in vivo have the potential to overcome these challenges and enable an enhanced level of patient care and safety. In this paper, we show the first medical robot that autonomously navigates a needle inside living tissue around anatomical obstacles to an intra-tissue target. Our system leverages an aiming device and a laser-patterned highly flexible steerable needle, a type of needle capable of maneuvering along curvilinear trajectories to avoid obstacles. The autonomous robot accounts for anatomical obstacles and uncertainty in living tissue/needle interaction with replanning and control and accounts for respiratory motion by defining safe insertion time windows during the breathing cycle. We apply the system to lung biopsy, which is critical in the diagnosis of lung cancer, the leading cause of cancer-related death in the United States. We demonstrate successful performance of our system in multiple in vivo porcine studies and also demonstrate that our approach leveraging autonomous needle steering outperforms a standard manual clinical technique for lung nodule access.
translated by 谷歌翻译
The field of emergent communication aims to understand the characteristics of communication as it emerges from artificial agents solving tasks that require information exchange. Communication with discrete messages is considered a desired characteristic, for both scientific and applied reasons. However, training a multi-agent system with discrete communication is not straightforward, requiring either reinforcement learning algorithms or relaxing the discreteness requirement via a continuous approximation such as the Gumbel-softmax. Both these solutions result in poor performance compared to fully continuous communication. In this work, we propose an alternative approach to achieve discrete communication -- quantization of communicated messages. Using message quantization allows us to train the model end-to-end, achieving superior performance in multiple setups. Moreover, quantization is a natural framework that runs the gamut from continuous to discrete communication. Thus, it sets the ground for a broader view of multi-agent communication in the deep learning era.
translated by 谷歌翻译
在视频分析中,背景模型具有许多应用,例如背景/前景分离,变更检测,异常检测,跟踪等。但是,尽管在静态相机捕获的视频中学习这种模型是一项公认的任务,但在移动相机背景模型(MCBM)的情况下,由于算法和可伸缩性挑战,成功率更加重要。由于相机运动而产生。因此,现有的MCBM在其范围和受支持的摄像头类型的限制中受到限制。这些障碍还阻碍了基于深度学习(DL)的端到端解决方案的这项无监督的任务。此外,现有的MCBM通常会在典型的大型全景图像或以在线方式的域名上建模背景。不幸的是,前者造成了几个问题,包括可扩展性差,而后者则阻止了对摄像机重新审视场景先前看到部分的案例的识别和利用。本文提出了一种称为DEEPMCBM的新方法,该方法消除了上述所有问题并实现最新结果。具体而言,首先,我们确定与一般和DL设置的视频帧联合对齐相关的困难。接下来,我们提出了一种新的联合一致性策略,使我们可以使用具有正则化的空间变压器网,也不是任何形式的专业化(且不差异)的初始化。再加上在不破坏的稳健中央矩(从关节对齐中获得)的自动编码器,这产生了一个无端到端的无端正规化MCBM,该MCBM支持广泛的摄像机运动并优雅地缩放。我们在各种视频上展示了DEEPMCBM的实用程序,包括超出其他方法范围的视频。我们的代码可在https://github.com/bgu-cs-vil/deepmcbm上找到。
translated by 谷歌翻译
本文提出了2022年访问量的挑战的最终结果。 OOV竞赛介绍了一个重要方面,而光学角色识别(OCR)模型通常不会研究,即,在培训时对看不见的场景文本实例的识别。竞赛编制了包含326,385张图像的公共场景文本数据集的集合,其中包含4,864,405个场景文本实例,从而涵盖了广泛的数据分布。形成了一个新的独立验证和测试集,其中包括在训练时出词汇量不超出词汇的场景文本实例。竞争是在两项任务中进行的,分别是端到端和裁剪的文本识别。介绍了基线和不同参与者的结果的详尽分析。有趣的是,在新研究的设置下,当前的最新模型显示出显着的性能差距。我们得出的结论是,在此挑战中提出的OOV数据集将是要探索的重要领域,以开发场景文本模型,以实现更健壮和广义的预测。
translated by 谷歌翻译
自2016年成立以来,Alexa奖计划使数百名大学生能够通过Socialbot Grand Challenge探索和竞争以发展对话代理商。挑战的目的是建立能够与人类在流行主题上连贯而诱人的代理人20分钟,同时达到至少4.0/5.0的平均评分。但是,由于对话代理商试图帮助用户完成日益复杂的任务,因此需要新的对话AI技术和评估平台。成立于2021年的Alexa奖Taskbot Challenge建立在Socialbot Challenge的成功基础上,通过引入交互式协助人类进行现实世界烹饪和做自己动手做的任务的要求,同时同时使用语音和视觉方式。这项挑战要求TaskBots识别和理解用户的需求,识别和集成任务和域知识,并开发新的方式,不分散用户的注意力,而不必分散他们的任务,以及其他挑战。本文概述了Taskbot挑战赛,描述了使用Cobot Toolkit提供给团队提供的基础架构支持,并总结了参与团队以克服研究挑战所采取的方法。最后,它分析了比赛第一年的竞争任务机器人的性能。
translated by 谷歌翻译